Structured Models for Audio Content Analysis Ph.D. Thesis Proposal

Authors

Sourish Chaudhuri

Rita Singh

Jaime Carbonell

Dan Ellis

Abstract

The ability to automatically analyze audio content is a key aspect of information retrieval systems that deal with multimodal files. The unprecedented growth and popularity of web-based platforms for sharing user-generated content have led to research efforts that attempt to understand the content of such files. Typically, audio analysis research has focused on a few specific tasks: detecting specific types of sounds, classifying content into categories, and summarizing the content of an audio file. These approaches have worked on small segments of audio individually, using supervised methods to detect patterns of interest. The main hypothesis driving this dissertation is that sound has its own language and structure and can be modeled using sequences of lower-level units, which we refer to as acoustic unit descriptors. These units may not carry semantic information individually, but their sequences or distributions should capture semantic information. In this language for sounds, the lower-level units are analogous to an alphabet. Representing sound as a discrete sequence lends itself naturally to a hierarchical structure, in which sequences of lower-level units map to real events that have clear semantic interpretations. Further, these event sequences themselves should carry information about the overall semantic content or class of the audio. Depending on the restrictions we enforce at various levels of this structure, we can use such structured models to classify or detect sound types, segment files into sequences of semantically meaningful sound types, or predict associated sound classes. In this proposal, we first summarize our prior work on learning the lower-level acoustic unit descriptors from audio data in an unsupervised manner. We demonstrate empirically that the learnt acoustic unit descriptors appear to capture semantic information, and that they can outperform other plausible semantically motivated schemes. We then discuss the proposed directions of research in this dissertation, including techniques that attempt to discover further structural relationships between sequences of these acoustic unit descriptors, or the event layer that lies above them. To discover this hidden structure, we propose to leverage the large amount of unlabeled and coarsely labeled data, using techniques inspired by semi-supervised and multi-instance learning approaches. The research pursued in this dissertation will demonstrate that hidden semantic structure can be automatically discovered from weakly labeled audio data. The use of such semantically informed features would enable audio analysis to improve significantly over the state of the art.
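
As a rough illustration of the representation the abstract describes, the sketch below turns a stream of feature frames into a discrete sequence of unit labels and then summarizes that sequence. It is not the authors' learning procedure, which this page does not specify: plain k-means vector quantization is swapped in as a stand-in for unsupervised unit learning, and the function names (`kmeans`, `to_units`, `collapse_runs`, `bigram_counts`) and the toy 2-D frames are all hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(frames, k, iters=20):
    """Learn k centroids from unlabeled frames (stand-in for unit learning)."""
    # Assumption: deterministic init from evenly spaced frames.
    centroids = [frames[i * len(frames) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda i: dist(f, centroids[i]))
            clusters[nearest].append(f)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                dims = range(len(members[0]))
                updated.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in dims)
                )
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids


def to_units(frames, centroids):
    """Map each frame to the index of its nearest centroid (its 'unit')."""
    return [
        min(range(len(centroids)), key=lambda i: dist(f, centroids[i]))
        for f in frames
    ]


def collapse_runs(units):
    """Merge consecutive repeats so the sequence reads like a string of symbols."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def bigram_counts(units):
    """Count adjacent unit pairs, a simple sequence-level feature."""
    return Counter(zip(units, units[1:]))


# Toy 2-D "feature frames" (hypothetical data, two well-separated sound types).
frames = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
          (5.0, 5.1), (5.1, 4.9), (5.0, 5.0),
          (0.1, 0.1)]
centroids = kmeans(frames, k=2)
units = to_units(frames, centroids)

print("units:", units)
print("unit histogram:", Counter(units))
print("bigrams:", bigram_counts(collapse_runs(units)))
```

The unit histogram corresponds to the "distribution of units" view in the abstract, and the bigram counts to the "sequence of units" view; either could serve as input to a classifier operating at the event or class level.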

Related resources

Explaining the Expectations of Faculty at Iran University of Medical Sciences from Doctoral Students in the Dissertation Process

Background: Knowing the expectations of supervisors may affect the quality of graduate students' theses. The aim of this study was to explore supervisors' expectations of Ph.D. students during the Ph.D. thesis process, using a qualitative content analysis design (conventional method). Methods: This qualitative study was conducted on 25 supervisors of Iran University of Medical Science...

Structuring and Querying Personalized Audio Using Ontologies Ph.D. Thesis Proposal

User-customized information selection and delivery reduces the complexity of the overwhelming amount of information available to end-users. Our approach employs user profiles, data selection, and presentation facilities to deliver customized audio information to end-users. Specifically, we construct a domain-dependent ontology (a collection of key concepts and their inter-relationships) to enab...

Thesis Proposal: CRF Autoencoder Models for Structured Prediction with Partial Supervision

Automatic Stencil Code Generation - Ph.D. Thesis Proposal

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. These calculations form the basis for a wide range of scientific applications from simple Jacobi iterations to complex multigrid and block structured adaptive PDE solvers. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and ...

Indexation de documents audio : Cas des grands volumes de données

This thesis is devoted to techniques for speaker-based recognition systems to scale up to large amounts of data and speaker models. We have chosen to partition audio documents (news broadcasts) according to speakers. The mel-cepstral acoustic characteristics of each speaker are modeled through a probabilistic Gaussian mixture model. First, speaker change detection in the stream is carried out by B...

Publication date: 2011
